Abstract

In query learning, the goal is to identify an unknown object while minimizing the number of "yes" or 
"no" questions (queries) posed about that object. A well-studied algorithm for query learning is known 
as generalized binary search (GBS). We show that GBS is a greedy algorithm to optimize the expected 
number of queries needed to identify the unknown object. We also generalize GBS in two ways. First, 
we consider the case where the cost of querying grows exponentially in the number of queries and the 
goal is to minimize the expected exponential cost. Then, we consider the case where the objects are 
partitioned into groups, and the objective is to identify only the group to which the object belongs. We 
derive algorithms to address these issues in a common, information-theoretic framework. In particular, 
we present an exact formula for the objective function in each case involving Shannon or Renyi entropy, 
and develop a greedy algorithm for minimizing it. Our algorithms are demonstrated on two applications 
of query learning, active learning and emergency response.

1 Introduction 

In query learning there is an unknown object $\theta$ belonging to a set $\Theta = \{\theta_1, \cdots, \theta_M\}$ of $M$ different
objects and a set $Q = \{q_1, \cdots, q_N\}$ of $N$ distinct subsets of $\Theta$ known as queries. Additionally, the vector
$\Pi = (\pi_1, \cdots, \pi_M)$ denotes the a priori probability distribution over $\Theta$. The goal is to determine the
unknown object $\theta \in \Theta$ through as few queries from $Q$ as possible, where a query $q \in Q$ returns a value
1 if $\theta \in q$, and 0 otherwise. A query learning algorithm thus corresponds to a binary decision tree,
where the internal nodes are queries, and the leaf nodes are objects. The above problem is motivated
by several real-world applications including fault testing [1, 2], machine diagnostics [3], disease diagnosis
[4, 5], computer vision [6] and active learning [7, 8]. Algorithms and performance guarantees have been
extensively developed in the literature, as described in Section 1.1 below. We also note that the above
problem is known more specifically as query learning with membership queries. See [9] for an overview of
query learning in general.

As a motivating example, consider the problem of toxic chemical identification, where a first responder 
may question victims of chemical exposure regarding the symptoms they experience. Chemicals that are 
inconsistent with the reported symptoms may then be eliminated. Given the importance of this problem, 
several organizations have developed extensive evidence-based databases (e.g., Wireless Information System 
for Emergency Responders (WISER) [10]) that record toxic chemicals and the acute symptoms which they
are known to cause. Unfortunately, many symptoms tend to be nonspecific (e.g., nausea can be caused by 
many different chemicals), and it is therefore critical for the first responder to pose these questions in a 
sequence that leads to chemical identification in as few questions as possible. 

A well-studied algorithm for query learning is known as the splitting algorithm [4] or generalized binary
search (GBS) [7, 8]. This is a greedy algorithm which selects a query that most evenly divides the
probability mass of the remaining objects [4, 7, 8, 11]. In this paper, we consider two important limitations
of GBS and propose natural extensions inspired by an information-theoretic perspective.

First, we note that GBS is tailored to minimize the average number of queries needed to identify $\theta$,
thereby implicitly assuming that the incremental cost for each additional query is constant. However,
in certain applications, the cost of additional queries grows. For example, in time-critical applications
such as toxic chemical identification, each additional symptom queried impacts a first responder's ability
to save lives. If some chemicals are less prevalent, GBS may require an unacceptably large number of
queries to identify them. This problem is compounded when the prior probabilities $\pi_i$ are inaccurately
specified. To address these issues, we consider an objective function where the cost of querying grows
exponentially in the number of queries. This objective function has been used earlier in the context of
source coding for the design of prefix-free codes (discussed in Section 1.1). We propose an extension of
GBS that greedily optimizes this exponential cost function. The proposed algorithm is also intrinsically
more robust to misspecification of the prior probabilities.

Second, we consider the case where the object set $\Theta$ is partitioned into groups of objects and it is only
necessary to identify the group to which the object belongs. This problem is once again motivated by toxic 
chemical identification where the appropriate response to a toxic chemical may only depend on the class 
of chemicals to which it belongs (pesticide, corrosive acid, etc.). As we explain below, a query learning 
algorithm such as GBS that is designed to rapidly identify individual objects is not necessarily efficient for 
group identification. Thus, we propose a natural extension of GBS for rapid group identification. Once 
again, we consider an objective where the cost of querying grows exponentially in the number of queries. 

1.1 Background and related work 

The goal of a standard query learning problem is to construct an optimal binary decision tree, where
each internal node in the tree is associated with a query from the set $Q$, and each leaf node corresponds
to an object from $\Theta$. Optimality is often with respect to the expected number of queries needed to
identify $\theta$, that is, the expected depth of the leaf node corresponding to the unknown object $\theta$. In the
special case when the query set $Q$ is complete¹, the problem of constructing an optimal binary decision
tree is equivalent to constructing optimal variable-length binary prefix-free codes with minimum expected
length. This problem has been widely studied in information theory with both Shannon [12] and Fano
[13] independently proposing a top-down greedy strategy to construct suboptimal binary prefix codes,
popularly known as Shannon-Fano codes. Huffman [14] derived a simple bottom-up algorithm to construct
optimal binary prefix codes. A well-known lower bound on the expected length of the optimal binary prefix
codes is given by the Shannon entropy of $\Pi$ [15].

The problem of query learning when the query set $Q$ is not complete has also been studied extensively
in the literature, with Garey [16, 17] proposing an optimal dynamic programming based algorithm. This
algorithm runs in exponential time in the worst case. Later, Hyafil and Rivest [18] showed that determining
an optimal binary decision tree for this problem is NP-complete. Thereafter, various greedy algorithms
[4, 19, 20] have been proposed to obtain a suboptimal binary decision tree. A widely studied solution is
the splitting algorithm [4] or generalized binary search (GBS) [7, 8]. Various bounds on the performance of
this greedy algorithm have been established in [4, 7, 8]. We show below in Corollary 1 that GBS greedily
minimizes the average number of queries, and thus weights each additional query by a constant.

Here, we consider an alternate objective function where the cost grows exponentially in the number of
queries. Specifically, the objective function is given by $L_\lambda(\Pi, \mathbf{d}) := \log_\lambda \left( \sum_{i=1}^{M} \pi_i \lambda^{d_i} \right)$, where $\lambda > 1$ and
$\mathbf{d} = (d_1, \cdots, d_M)$, $d_i$ corresponding to the number of queries required to identify object $\theta_i$ using a given
tree. This cost function was proposed by Campbell [21] in the context of source coding for the design of
binary prefix-free codes. It has also been used recently for the design of alphabetic codes [22] and random
search trees [23].
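As a small illustration of this cost function, the sketch below evaluates $L_\lambda(\Pi, \mathbf{d})$ for a made-up prior and a made-up vector of leaf depths (both are assumptions chosen for the example, not data from the paper). It suggests how the cost moves from the average depth toward the worst-case depth as $\lambda$ grows.

```python
import math

def exponential_cost(prior, depths, lam):
    """Return L_lambda(Pi, d) = log_lambda(sum_i pi_i * lam**d_i) for lam > 1."""
    return math.log(sum(p * lam ** d for p, d in zip(prior, depths)), lam)

# Hypothetical example: four objects and the leaf depths of some decision tree.
prior = [0.5, 0.25, 0.125, 0.125]
depths = [1, 2, 3, 3]

average_depth = sum(p * d for p, d in zip(prior, depths))
near_one = exponential_cost(prior, depths, 1.0 + 1e-6)   # close to the average depth
at_two = exponential_cost(prior, depths, 2.0)
very_large = exponential_cost(prior, depths, 1e12)       # close to the worst-case depth

print(average_depth, near_one, at_two, very_large)
```

The value for a large $\lambda$ stays slightly below the maximum depth because the deepest leaves carry less than the full probability mass.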

Campbell [24] defines a generalized entropy function in terms of a coding problem and shows that the
$\alpha$-Rényi entropy, given by $H_\alpha(\Pi) = \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{M} \pi_i^\alpha \right)$, can be characterized as

$$H_\alpha(\Pi) = \min_{\mathbf{d} \in \mathcal{S}} L_\lambda(\Pi, \mathbf{d}) \qquad (1)$$

where $\alpha = \frac{1}{1 + \log_2 \lambda}$ and $\mathcal{S}$ is the set of all real distributions of $\mathbf{d}$ for which Kraft's inequality,
$\sum_{i=1}^{M} 2^{-d_i} \le 1$, is satisfied. For clarity, here we show the dependence of $L_\lambda(\Pi)$ on $\mathbf{d}$, although later this dependence
will not be made explicit. Note that the numbers $d_i$ are not restricted to integer values in $\mathcal{S}$, hence the
Rényi entropy merely provides a lower bound on the exponential cost function of any binary decision tree.
In the special case when the query set $Q$ is complete, it has been shown that an optimal binary decision
tree (i.e., optimal binary prefix-free codes) that minimizes $L_\lambda(\Pi)$ can be obtained by a modified version
of the Huffman algorithm [25, 26, 27, 23]. However, when the query set $Q$ is not complete, to the best of our
knowledge there is no algorithm that constructs a good suboptimal decision tree.

¹ A query set $Q$ is said to be complete if for any $S \subseteq \Theta$ there exists a query $q \in Q$ such that either $q = S$ or $\Theta \setminus q = S$.
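The lower bound can be checked numerically. The sketch below assumes a made-up prior and uses the real-valued depths $d_i = -\log_2 \left( \pi_i^\alpha / \sum_j \pi_j^\alpha \right)$, one choice that satisfies Kraft's inequality with equality and attains the Rényi entropy. It then compares the bound with the cost of a tree that has integer depths.

```python
import math

def renyi_entropy(prior, alpha):
    """alpha-Renyi entropy in bits, for alpha in (0, 1)."""
    return math.log2(sum(p ** alpha for p in prior)) / (1.0 - alpha)

def exponential_cost(prior, depths, lam):
    """L_lambda(Pi, d) = log_lambda(sum_i pi_i * lam**d_i)."""
    return math.log(sum(p * lam ** d for p, d in zip(prior, depths)), lam)

lam = 2.0
alpha = 1.0 / (1.0 + math.log2(lam))       # alpha = 1/2 for lam = 2
prior = [0.5, 0.25, 0.125, 0.125]          # hypothetical prior

# Real-valued "depths" that meet Kraft's inequality with equality.
norm = sum(p ** alpha for p in prior)
real_depths = [-math.log2(p ** alpha / norm) for p in prior]
kraft_sum = sum(2.0 ** -d for d in real_depths)

bound = renyi_entropy(prior, alpha)
relaxed_cost = exponential_cost(prior, real_depths, lam)   # attains the bound
tree_cost = exponential_cost(prior, [1, 2, 3, 3], lam)     # integer depths of an actual tree

print(kraft_sum, bound, relaxed_cost, tree_cost)
```

The integer-depth tree cannot do better than the relaxed solution, which is why the Rényi entropy is only a lower bound for decision trees.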



1.2 Notation 

We denote a query learning problem by a pair $(\mathbf{B}, \Pi)$ where $\mathbf{B}$ is a known $M \times N$ binary matrix with $b_{ij}$
equal to 1 if $\theta_i \in q_j$, and 0 otherwise. A decision tree $T$ constructed on $(\mathbf{B}, \Pi)$ has a query from the set $Q$
at each of its internal nodes, with the leaf nodes terminating in the objects from $\Theta$. At each internal node
in the tree, the objects that have reached that node are divided into two subsets, depending on whether
they respond 0 or 1 to the query, respectively. For a decision tree with $L$ leaves, the leaf nodes are indexed
by the set $\mathcal{L} = \{1, \cdots, L\}$ and the internal nodes are indexed by the set $\mathcal{I} = \{L + 1, \cdots, 2L - 1\}$. At any
internal node $a \in \mathcal{I}$, let $l(a)$, $r(a)$ denote the "left" and "right" child nodes, and let $\Theta_a \subseteq \Theta$ denote the set
of objects that reach node 'a'. Thus, the sets $\Theta_{l(a)} \subseteq \Theta_a$, $\Theta_{r(a)} \subseteq \Theta_a$ correspond to the objects in $\Theta_a$ that
respond 0 and 1 to the query at node 'a', respectively. We denote by $\pi_{\Theta_a} := \sum_{\{i : \theta_i \in \Theta_a\}} \pi_i$ the probability
mass of the objects reaching node 'a' in the tree. Also, at any node 'a', the set $Q_a \subseteq Q$ corresponds to the
set of queries that have been performed along the path from the root node up to node 'a'.

We denote the $\alpha$-Rényi entropy of a vector $\Pi = (\pi_1, \cdots, \pi_M)$ by $H_\alpha(\Pi) := \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{M} \pi_i^\alpha \right)$ and
its Shannon entropy by $H(\Pi) := -\sum_i \pi_i \log_2 \pi_i$, where we use the limit $\lim_{\pi \to 0} \pi \log_2 \pi = 0$ to define the
limiting cases as $\pi_i \to 0$ for any $i$. Using L'Hôpital's rule, it can be seen that $\lim_{\alpha \to 1} H_\alpha(\Pi) = H(\Pi)$. Also,
we denote the Shannon entropy of a proportion $\pi \in [0, 1]$ by $H(\pi) := -\pi \log_2 \pi - (1 - \pi) \log_2 (1 - \pi)$.
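For readers who want the intermediate step, the limit of the Rényi entropy can be worked out as follows. This is a sketch in the notation above; natural logarithms appear only in the differentiation step, and the last step uses $\sum_i \pi_i = 1$.

```latex
\lim_{\alpha \to 1} H_\alpha(\Pi)
  = \lim_{\alpha \to 1} \frac{\log_2 \sum_{i} \pi_i^{\alpha}}{1 - \alpha}
  = \lim_{\alpha \to 1}
    \frac{\dfrac{1}{\ln 2} \cdot \dfrac{\sum_{i} \pi_i^{\alpha} \ln \pi_i}{\sum_{i} \pi_i^{\alpha}}}{-1}
  = -\sum_{i} \pi_i \log_2 \pi_i
  = H(\Pi).
```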



2 Object Identification 

We begin with the basic query learning problem where the goal is to identify the unknown object $\theta \in \Theta$ in
as few queries from $Q$ as possible. We propose a family of greedy algorithms to minimize the exponential
cost function $L_\lambda(\Pi)$ where $\lambda > 1$. These algorithms are based on Theorem 1, which provides an explicit
formula for the gap in Campbell's lower bound. We also note that $L_\lambda(\Pi)$ reduces to the average depth and
the worst case depth in the limiting cases when $\lambda$ tends to one and infinity, respectively. In particular,

$$L_1(\Pi) := \lim_{\lambda \to 1} L_\lambda(\Pi) = \sum_{i=1}^{M} \pi_i d_i$$

$$L_\infty(\Pi) := \lim_{\lambda \to \infty} L_\lambda(\Pi) = \max_{i \in \{1, \cdots, M\}} d_i$$

where $d_i$ denotes the number of queries required to identify object $\theta_i$ in a given tree. In these limiting
cases, the entropy lower bound on the cost function reduces to the Shannon entropy $H(\Pi)$ and $\log_2 M$,
respectively.

Given a query learning problem $(\mathbf{B}, \Pi)$, let $\mathcal{T}(\mathbf{B}, \Pi)$ denote the set of decision trees that can uniquely
identify all the objects in the set $\Theta$.

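Theorem 1. For any $\lambda > 1$, the average exponential depth of the leaf nodes $L_\lambda(\Pi)$ in a tree $T \in \mathcal{T}(\mathbf{B}, \Pi)$
is given by

$$L_\lambda(\Pi) = \log_\lambda \left( \lambda^{H_\alpha(\Pi)} + \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ (\lambda - 1)\lambda^{d_a} - \mathcal{D}_\alpha(\Theta_a) + \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right] \right) \qquad (2)$$

where $d_a$ denotes the depth of any internal node 'a' in the tree, $\Theta_a$ denotes the set of objects that reach
node 'a', $\pi_{\Theta_a} = \sum_{\{i : \theta_i \in \Theta_a\}} \pi_i$, $\alpha = \frac{1}{1 + \log_2 \lambda}$ and $\mathcal{D}_\alpha(\Theta_a) := \left[ \sum_{\{i : \theta_i \in \Theta_a\}} \left( \frac{\pi_i}{\pi_{\Theta_a}} \right)^\alpha \right]^{1/\alpha}$.

Proof. Special case of Theorem 2 below. □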
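The identity in (2) can be sanity-checked numerically on a small tree. The sketch below assumes a made-up prior and a made-up tree shape, encoded as nested pairs with object indices at the leaves (an encoding chosen here for illustration). It computes the left-hand side from the leaf depths and the right-hand side from the internal-node terms.

```python
import math

def objects(node):
    """A leaf is an object index (int); an internal node is a (left, right) pair."""
    return [node] if isinstance(node, int) else objects(node[0]) + objects(node[1])

def mass(node, prior):
    return sum(prior[i] for i in objects(node))

def d_alpha(node, prior, alpha):
    m = mass(node, prior)
    return sum((prior[i] / m) ** alpha for i in objects(node)) ** (1.0 / alpha)

def direct_sum(node, prior, lam, depth=0):
    """sum_i pi_i * lam**d_i, read off the leaf depths."""
    if isinstance(node, int):
        return prior[node] * lam ** depth
    return sum(direct_sum(child, prior, lam, depth + 1) for child in node)

def gap_sum(node, prior, lam, alpha, depth=0):
    """Sum over internal nodes of the bracketed term in (2), weighted by the node mass."""
    if isinstance(node, int):
        return 0.0
    m = mass(node, prior)
    term = (lam - 1.0) * lam ** depth - d_alpha(node, prior, alpha)
    term += sum(mass(c, prior) / m * d_alpha(c, prior, alpha) for c in node)
    return m * term + sum(gap_sum(c, prior, lam, alpha, depth + 1) for c in node)

prior = [0.4, 0.3, 0.2, 0.1]   # hypothetical prior
tree = (0, (1, (2, 3)))        # hypothetical tree: object 0 at depth 1, objects 2 and 3 at depth 3
lam = 3.0
alpha = 1.0 / (1.0 + math.log2(lam))
renyi = math.log2(sum(p ** alpha for p in prior)) / (1.0 - alpha)

lhs = math.log(direct_sum(tree, prior, lam), lam)
rhs = math.log(lam ** renyi + gap_sum(tree, prior, lam, alpha), lam)
print(lhs, rhs)
```

The two values agree up to floating-point error because the $\mathcal{D}_\alpha$ terms telescope along the tree, leaving only the root term $\lambda^{H_\alpha(\Pi)}$ and the leaf depths.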

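Theorem 1 provides an explicit formula for the gap in Campbell's lower bound, namely, the term
in summation over internal nodes $\mathcal{I}$ in (2). Using this theorem, the problem of finding a decision tree with
minimum $L_\lambda(\Pi)$ can be formulated as the following optimization problem:

$$\min_{T \in \mathcal{T}(\mathbf{B}, \Pi)} \log_\lambda \left( \lambda^{H_\alpha(\Pi)} + \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ (\lambda - 1)\lambda^{d_a} - \mathcal{D}_\alpha(\Theta_a) + \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right] \right)$$

Since $\log_\lambda$ is a monotonic increasing function and $\Pi$ is fixed for a given problem, the above optimization
problem can be reduced to

$$\min_{T \in \mathcal{T}(\mathbf{B}, \Pi)} \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ (\lambda - 1)\lambda^{d_a} - \mathcal{D}_\alpha(\Theta_a) + \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right]. \qquad (3)$$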



As we show in Section 2.1, this optimization problem is a generalized version of an optimization problem
that is NP-complete. Hence, we propose a suboptimal approach to solve this optimization problem where we
minimize the objective function locally instead of globally. We take a top-down approach and minimize the
objective function by minimizing the term $\pi_{\Theta_a} \left[ (\lambda - 1)\lambda^{d_a} - \mathcal{D}_\alpha(\Theta_a) + \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right]$
at each internal node, starting from the root node. Note that the terms that depend on the query chosen
at node 'a' are $\pi_{\Theta_{l(a)}}$, $\pi_{\Theta_{r(a)}}$, $\mathcal{D}_\alpha(\Theta_{l(a)})$ and $\mathcal{D}_\alpha(\Theta_{r(a)})$. Hence, the objective function to be minimized at
each internal node reduces to $C_a := \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)})$. The algorithm, which we refer to
as $\lambda$-GBS, can be summarized as shown in Algorithm 1.

$\lambda$-GBS

Initialization: Let the leaf set consist of the root node, $Q_{root} = \emptyset$
while some leaf node 'a' has $|\Theta_a| > 1$ do
    for each query $q \in Q \setminus Q_a$ do
        Find $\Theta_{l(a)}$ and $\Theta_{r(a)}$ produced by making a split with query $q$
        Compute the cost $C_a(q)$ of making a split with query $q$
    end
    Choose a query with the least cost $C_a$ at node 'a'
    Form child nodes $l(a)$, $r(a)$
end

Algorithm 1: Greedy decision tree algorithm for minimizing average exponential depth
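The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the recursive formulation, the rule of skipping queries that do not split a node, and the first-index tie-breaking are all assumptions made here, and the matrix and prior are made up.

```python
import math

def d_alpha(indices, prior, alpha):
    """D_alpha for the set of objects `indices`."""
    m = sum(prior[i] for i in indices)
    return sum((prior[i] / m) ** alpha for i in indices) ** (1.0 / alpha)

def split_cost(left, right, prior, alpha):
    """C_a for a candidate split into `left` and `right`."""
    total = sum(prior[i] for i in left + right)
    return sum(sum(prior[i] for i in part) / total * d_alpha(part, prior, alpha)
               for part in (left, right))

def lambda_gbs(B, prior, lam):
    """Return the leaf depth of every object in the greedily built tree."""
    alpha = 1.0 / (1.0 + math.log2(lam))
    depths = {}

    def grow(indices, used, depth):
        if len(indices) == 1:
            depths[indices[0]] = depth
            return
        best = None
        for q in range(len(B[0])):
            if q in used:
                continue
            left = [i for i in indices if B[i][q] == 0]
            right = [i for i in indices if B[i][q] == 1]
            if not left or not right:
                continue  # assumption: ignore queries that do not split this node
            cost = split_cost(left, right, prior, alpha)
            if best is None or cost < best[0]:
                best = (cost, q, left, right)
        if best is None:
            raise ValueError("remaining queries cannot separate these objects")
        _, q, left, right = best
        grow(left, used | {q}, depth + 1)
        grow(right, used | {q}, depth + 1)

    grow(list(range(len(B))), frozenset(), 0)
    return [depths[i] for i in range(len(B))]

# Hypothetical problem: 4 objects (rows), 3 queries (columns), skewed prior.
B = [[0, 1, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
prior = [0.7, 0.1, 0.1, 0.1]
depths = lambda_gbs(B, prior, lam=2.0)
print(depths)
```

In this example the most probable object is isolated by the first query, and the remaining three objects are separated below it.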

In the following two sections, we show that in the limiting case when $\lambda$ tends to one, where the average
exponential depth reduces to the average linear depth, $\lambda$-GBS reduces to GBS, and in the case when $\lambda$
tends to infinity, $\lambda$-GBS reduces to GBS with the uniform prior $\pi_i = 1/M$.

2.1 Average case 

We now present an exact formula for the average number of queries $L_1(\Pi)$ required to identify an
unknown object $\theta$ using a given tree, and show that GBS is a greedy algorithm to minimize this expression.
First, we define a parameter called the reduction factor on the binary matrix/tree combination that provides
a useful quantification of the cost function $L_1(\Pi)$.

Definition 1 (Reduction factor). Let $T$ be a decision tree constructed on the query learning problem
$(\mathbf{B}, \Pi)$. The reduction factor at any internal node 'a' in the tree is defined by $\rho_a = \max\{\pi_{\Theta_{l(a)}}, \pi_{\Theta_{r(a)}}\} / \pi_{\Theta_a}$.

Note that $0.5 \le \rho_a \le 1$.

Corollary 1. The expected number of queries required to identify an unknown object using a tree $T \in \mathcal{T}(\mathbf{B}, \Pi)$
is given by

$$L_1(\Pi) = H(\Pi) + \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ 1 - H(\rho_a) \right] \qquad (4)$$

where $H(\cdot)$ denotes the Shannon entropy.

Proof. The result follows from Theorem 1 by taking the limit as $\lambda$ tends to 1 and applying L'Hôpital's rule
on both sides of (2). □

This corollary reiterates an earlier observation that the expected number of queries required to identify
an unknown object using a tree $T$ is bounded below by the Shannon entropy $H(\Pi)$. Besides, it presents
the exact formula for the gap in this lower bound. It also follows from the above result that a tree attains
this minimum value (i.e., $L_1(\Pi) = H(\Pi)$) iff the reduction factors are equal to 0.5 at each internal node in
the tree.
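The formula in (4) can be checked on a small example. The sketch below assumes a made-up prior and tree (nested pairs with object indices at the leaves, an encoding chosen for illustration) and compares the average depth with the Shannon entropy plus the per-node gap terms.

```python
import math

def binary_entropy(p):
    """Shannon entropy (bits) of a proportion p in [0, 1]."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def objects(node):
    """A leaf is an object index (int); an internal node is a (left, right) pair."""
    return [node] if isinstance(node, int) else objects(node[0]) + objects(node[1])

def mass(node, prior):
    return sum(prior[i] for i in objects(node))

def average_depth(node, prior, depth=0):
    if isinstance(node, int):
        return prior[node] * depth
    return sum(average_depth(c, prior, depth + 1) for c in node)

def gap(node, prior):
    """Sum over internal nodes of pi_a * (1 - H(rho_a))."""
    if isinstance(node, int):
        return 0.0
    m = mass(node, prior)
    rho = max(mass(c, prior) for c in node) / m   # reduction factor at this node
    return m * (1.0 - binary_entropy(rho)) + sum(gap(c, prior) for c in node)

prior = [0.4, 0.3, 0.2, 0.1]   # hypothetical prior
tree = (0, (1, (2, 3)))        # hypothetical tree
shannon = -sum(p * math.log2(p) for p in prior)

lhs = average_depth(tree, prior)
rhs = shannon + gap(tree, prior)
print(lhs, rhs)
```

Only the nodes whose split is unbalanced contribute to the gap; a node with $\rho_a = 0.5$ adds nothing.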

Using this result, the problem of finding a decision tree with minimum $L_1(\Pi)$ can be formulated as the
following optimization problem:

$$\min_{T \in \mathcal{T}(\mathbf{B}, \Pi)} H(\Pi) + \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ 1 - H(\rho_a) \right]. \qquad (5)$$

Since $\Pi$ is fixed, this optimization problem reduces to minimizing $\sum_{a \in \mathcal{I}} \pi_{\Theta_a} [1 - H(\rho_a)]$ over the set of
trees $\mathcal{T}(\mathbf{B}, \Pi)$. Note that this optimization problem is a special case of the optimization problem in (3).
As mentioned earlier, finding a globally optimal solution for this optimization problem is NP-complete [18].
Instead, we may take a top-down approach and minimize the objective function by minimizing the term
$C_a := \pi_{\Theta_a} [1 - H(\rho_a)]$ at each internal node, starting from the root node. Note that the only term that
depends on the query chosen at node 'a' in this cost function is $\rho_a$. Hence the algorithm reduces to
minimizing $\rho_a$ (i.e., choosing a split as balanced as possible) at each internal node $a \in \mathcal{I}$. As a result,
$\lambda$-GBS reduces to GBS in this case. Finally, generalized binary search (GBS) is summarized in Algorithm 2.



Generalized Binary Search (GBS)

Initialization: Let the leaf set consist of the root node, $Q_{root} = \emptyset$
while some leaf node 'a' has $|\Theta_a| > 1$ do
    for each query $q \in Q \setminus Q_a$ do
        Find $\Theta_{l(a)}$ and $\Theta_{r(a)}$ produced by making a split with query $q$
        Compute $\rho_a$ produced by making a split with query $q$
    end
    Choose a query with the least $\rho_a$ at node 'a'
    Form child nodes $l(a)$, $r(a)$
end

Algorithm 2: Greedy decision tree algorithm for minimizing average depth
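A minimal Python sketch of Algorithm 2 is given below. The function name, the recursive structure, the rule of skipping non-splitting queries, and the first-index tie-breaking are assumptions made for illustration. The example uses threshold queries on a line (also made up), for which GBS behaves like classical binary search.

```python
def gbs_depths(B, prior):
    """Greedy GBS sketch: at each node pick the query whose split is most balanced in probability mass."""
    depths = {}

    def grow(indices, used, depth):
        if len(indices) == 1:
            depths[indices[0]] = depth
            return
        total = sum(prior[i] for i in indices)
        best = None
        for q in range(len(B[0])):
            if q in used:
                continue
            left = [i for i in indices if B[i][q] == 0]
            right = [i for i in indices if B[i][q] == 1]
            if not left or not right:
                continue  # assumption: ignore queries that do not split this node
            rho = max(sum(prior[i] for i in left), sum(prior[i] for i in right)) / total
            if best is None or rho < best[0]:
                best = (rho, q, left, right)
        if best is None:
            raise ValueError("remaining queries cannot separate these objects")
        _, q, left, right = best
        grow(left, used | {q}, depth + 1)
        grow(right, used | {q}, depth + 1)

    grow(list(range(len(B))), frozenset(), 0)
    return [depths[i] for i in range(len(B))]

# Hypothetical example: 8 objects on a line; query j asks "is the object index greater than j?"
M = 8
B = [[1 if i > j else 0 for j in range(M - 1)] for i in range(M)]
uniform = [1.0 / M] * M
depths = gbs_depths(B, uniform)
print(depths)
```

With a uniform prior every split is perfectly balanced, so each object is identified in $\log_2 8 = 3$ queries.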



2.2 Worst case 

Here, we present the other limiting case of the family of greedy algorithms $\lambda$-GBS, $\lambda \to \infty$. As noted in
Section 2, the exponential cost function $L_\lambda(\Pi)$ reduces to the worst case depth of any leaf node in this
case. Note that GBS with the uniform prior is an intuitive algorithm for minimizing the worst case depth.
Here, we present a theoretical justification for the same.






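Corollary 2. In the limiting case when $\lambda \to \infty$, the optimization problem

$$\min \log_\lambda \left( \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right) \longrightarrow \min \max\{ |\Theta_{l(a)}|, |\Theta_{r(a)}| \}$$

Proof. Applying L'Hôpital's rule, we get

$$\lim_{\lambda \to \infty} \log_\lambda \left( \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right) = \max\{ \log_2 |\Theta_{l(a)}|, \log_2 |\Theta_{r(a)}| \}$$

Since $\log_2$ is a monotonic increasing function, the optimization problem $\min \max\{ \log_2 |\Theta_{l(a)}|, \log_2 |\Theta_{r(a)}| \}$
is equivalent to the optimization problem $\min \max\{ |\Theta_{l(a)}|, |\Theta_{r(a)}| \}$. □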

Note that the cost function minimized at each internal node of a tree in $\lambda$-GBS is $C_a := \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) +
\frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)})$. Since $\log_\lambda$ is a monotonic function, this is equivalent to minimizing the function $\log_\lambda(C_a)$.
We know from Corollary 2 that in the limiting case when $\lambda$ tends to infinity, this reduces to minimizing
$\max\{ |\Theta_{l(a)}|, |\Theta_{r(a)}| \}$. Hence, in this limiting case, $\lambda$-GBS reduces to GBS with uniform prior, thereby
completely eliminating the dependence of the algorithm on the prior distribution $\Pi$. More generally, as $\lambda$
increases, $\lambda$-GBS becomes less sensitive to the prior distribution, and therefore more robust if the prior is
misspecified.
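The limit in Corollary 2 can be observed numerically. The sketch below assumes a made-up split (three objects on one side, one on the other, with a skewed prior) and parametrizes $\lambda = 2^t$ so that the powers stay within floating-point range; the function name and the chosen values of $t$ are illustrative.

```python
import math

def log_lambda_cost(left, right, log2_lam):
    """log_lambda(C_a) for one split, with lambda = 2**log2_lam; left/right are lists of prior masses."""
    alpha = 1.0 / (1.0 + log2_lam)
    total = sum(left) + sum(right)

    def d_alpha(part):
        m = sum(part)
        return sum((p / m) ** alpha for p in part) ** (1.0 / alpha)

    cost = sum(sum(part) / total * d_alpha(part) for part in (left, right))
    return math.log2(cost) / log2_lam

left, right = [0.7, 0.1, 0.1], [0.1]   # hypothetical split: three objects versus one
values = [log_lambda_cost(left, right, t) for t in (1, 10, 100, 400)]
limit = math.log2(max(len(left), len(right)))   # log2 of the larger child size
print(values, limit)
```

As $t$ grows, the values approach $\log_2 3$ regardless of how skewed the prior masses are, which is the sense in which the prior drops out of the criterion.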



3 Group Identification 

In this section, we consider the problem of identifying the group of an unknown object $\theta \in \Theta$, rather than
the object itself, with as few queries as possible. Here, in addition to the binary matrix $\mathbf{B}$ and a priori
probability distribution $\Pi$ on the objects, the group labels for the objects are also provided, where the
groups are assumed to be disjoint.

We denote a query learning problem for group identification by $(\mathbf{B}, \Pi, \mathbf{y})$, where $\mathbf{y} = (y_1, \cdots, y_M)$
denotes the group labels of the objects, $y_i \in \{1, \cdots, m\}$. Let $\{\Theta^i\}_{i=1}^{m}$ be the partition of the object set $\Theta$,
where $\Theta^i = \{\theta_k \in \Theta : y_k = i\}$. It is important to note here that the group identification problem cannot
be simply reduced to a standard query learning problem with groups $\{\Theta^1, \cdots, \Theta^m\}$ as meta "objects,"
since the objects within a group need not respond the same to each query. For example, consider the toy
example shown in Figure 1, where the objects $\theta_1$, $\theta_2$ and $\theta_3$ belonging to group 1 cannot be considered as
one single meta object as these objects respond differently to queries $q_1$ and $q_3$.

In this context, we also note that GBS can fail to find a good solution for a group identification problem
as it does not take the group labels into consideration while choosing queries. Once again, consider the toy
example shown in Figure 1, where just one query (query $q_2$) is sufficient to identify the group of an unknown
object, whereas GBS requires 2 queries to identify the group when the unknown object is either $\theta_2$ or $\theta_4$.
Here, we propose a natural extension of $\lambda$-GBS to the problem of group identification. Specifically, we
propose a family of greedy algorithms that aim to minimize the average exponential cost for the problem
of group identification.
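The behaviour described for the toy example can be reproduced with a small stand-in problem. The matrix below is an assumption constructed to match the description in the text (four objects, three queries, group 1 containing the first three objects); it is not taken from Figure 1, and a uniform prior is assumed.

```python
# Hypothetical stand-in for the toy example: rows are objects theta_1..theta_4,
# columns are queries q_1..q_3; the entries are assumed, not read from Figure 1.
B = [[0, 1, 1],
     [1, 1, 0],
     [0, 1, 0],
     [1, 0, 0]]
groups = [1, 1, 1, 2]
prior = [0.25, 0.25, 0.25, 0.25]

def split(q):
    """Object indices answering 0 and 1 to query q."""
    return ([i for i in range(len(B)) if B[i][q] == 0],
            [i for i in range(len(B)) if B[i][q] == 1])

def identifies_group(q):
    """True if each side of the split contains objects of a single group."""
    return all(len({groups[i] for i in side}) <= 1 for side in split(q))

def reduction_factor(q):
    left, right = split(q)
    return max(sum(prior[i] for i in left), sum(prior[i] for i in right))

group_queries = [q for q in range(3) if identifies_group(q)]   # queries that settle the group alone
gbs_first = min(range(3), key=reduction_factor)                # GBS picks the most balanced split
mixed = [side for side in split(gbs_first) if len({groups[i] for i in side}) > 1]
print(group_queries, gbs_first, mixed)
```

With these assumed entries, only the second query (index 1) settles the group by itself, while GBS starts with the first query (index 0) and leaves the second and fourth objects in a node that still mixes two groups.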

Note that when constructing a tree for group identification, a greedy, top-down algorithm terminates
splitting when all the objects at the node belong to the same group. Hence, a tree constructed in this
fashion can have multiple objects ending in the same leaf node and multiple leaves ending in the same
group.

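Figure 1: Toy example
Figure 2: Decision tree constructed using GBS

For a tree with $L$ leaves, we denote by $\mathcal{L}^i \subset \mathcal{L} = \{1, \cdots, L\}$ the set of leaves that terminate in group $i$.
Similar to $\Theta^i \subseteq \Theta$, we denote by $\Theta^i_a \subseteq \Theta_a$ the set of objects belonging to group $i$ that reach internal node
$a \in \mathcal{I}$ in the tree.

Given $(\mathbf{B}, \Pi, \mathbf{y})$, let $\mathcal{T}(\mathbf{B}, \Pi, \mathbf{y})$ denote the set of decision trees that can uniquely identify the groups of
all objects in the set $\Theta$. For any decision tree $T \in \mathcal{T}(\mathbf{B}, \Pi, \mathbf{y})$, let $d_j$ denote the depth of leaf node $j \in \mathcal{L}$.
Let random variable $X$ denote the exponential cost incurred in identifying the group of an unknown object
$\theta \in \Theta$. Then, the average exponential cost $L_\lambda(\Pi)$ of identifying the group of the unknown object $\theta$ using
a given tree is defined as

$$\lambda^{L_\lambda(\Pi)} = \sum_{i=1}^{m} \Pr(\theta \in \Theta^i) \, E[X \mid \theta \in \Theta^i]$$

$$\Longrightarrow \quad L_\lambda(\Pi) = \log_\lambda \left( \sum_{i=1}^{m} \pi_{\Theta^i} \left[ \sum_{j \in \mathcal{L}^i} \frac{\pi_{\Theta_j}}{\pi_{\Theta^i}} \lambda^{d_j} \right] \right)$$

In the limiting cases when $\lambda$ tends to one and infinity, the cost function $L_\lambda(\Pi)$ reduces to

$$L_1(\Pi) := \lim_{\lambda \to 1} L_\lambda(\Pi) = \sum_{i=1}^{m} \pi_{\Theta^i} \left[ \sum_{j \in \mathcal{L}^i} \frac{\pi_{\Theta_j}}{\pi_{\Theta^i}} d_j \right]$$

$$L_\infty(\Pi) := \lim_{\lambda \to \infty} L_\lambda(\Pi) = \max_{j \in \mathcal{L}} d_j.$$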

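Theorem 2. For any $\lambda > 1$, the average exponential cost $L_\lambda(\Pi)$ of identifying the group of an object using
a tree $T \in \mathcal{T}(\mathbf{B}, \Pi, \mathbf{y})$ is given by

$$L_\lambda(\Pi) = \log_\lambda \left( \lambda^{H_\alpha(\Pi_y)} + \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ (\lambda - 1)\lambda^{d_a} - \mathcal{D}_\alpha(\Theta_a) + \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right] \right) \qquad (6)$$

where $\Pi_y = (\pi_{\Theta^1}, \cdots, \pi_{\Theta^m})$ denotes the probability distribution of the object groups induced by the labels $\mathbf{y}$
and $\mathcal{D}_\alpha(\Theta_a) := \left[ \sum_{i=1}^{m} \left( \frac{\pi_{\Theta^i_a}}{\pi_{\Theta_a}} \right)^\alpha \right]^{1/\alpha}$, with $\alpha = \frac{1}{1 + \log_2 \lambda}$, $\pi_{\Theta^i} = \sum_{\{k : y_k = i\}} \pi_k$ and $\pi_{\Theta^i_a} = \sum_{\{k : \theta_k \in \Theta_a, y_k = i\}} \pi_k$.

Proof. See Appendix. □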



Note that the definition of $\mathcal{D}_\alpha(\Theta_a)$ in this theorem is a generalization of that in Theorem 1. The above
theorem states that given a query learning problem for group identification $(\mathbf{B}, \Pi, \mathbf{y})$, the exponential cost
function $L_\lambda(\Pi)$ is bounded below by the $\alpha$-Rényi entropy of the probability distribution of the groups. It
also explicitly states the gap in this lower bound. Note that Theorem 1 is a special case of this theorem
where each group is of size 1.

Using Theorem 2, the problem of finding a decision tree with minimum cost function $L_\lambda(\Pi)$ can be
formulated as the following optimization problem:

$$\min_{T \in \mathcal{T}(\mathbf{B}, \Pi, \mathbf{y})} \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ (\lambda - 1)\lambda^{d_a} - \mathcal{D}_\alpha(\Theta_a) + \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right]. \qquad (7)$$

This optimization problem, being a generalized version of the optimization problem in (3), is NP-complete.
Hence, we propose a suboptimal approach to solve this optimization problem where we optimize the objective
function locally instead of globally. We take a top-down approach and minimize the objective function by
minimizing the term $C_a := \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)})$ at each internal node, starting from the root
node. The algorithm, which we refer to as $\lambda$-GGBS, is summarized in Algorithm 3.

$\lambda$-Group identification Generalized Binary Search ($\lambda$-GGBS)

Initialization: Let the leaf set consist of the root node, $Q_{root} = \emptyset$
while some leaf node 'a' has more than one group of objects do
    for each query $q \in Q \setminus Q_a$ do
        Compute $\{\Theta^i_{l(a)}\}_{i=1}^{m}$ and $\{\Theta^i_{r(a)}\}_{i=1}^{m}$ produced by making a split with query $q$
        Compute the cost $C_a(q)$ of making a split with query $q$
    end
    Choose a query with the least cost $C_a$ at node 'a'
    Form child nodes $l(a)$, $r(a)$
end

Algorithm 3: Greedy decision tree algorithm for group identification that minimizes average exponential cost
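Algorithm 3 can be sketched in Python along the following lines. As with the earlier sketches, the names, the recursion, the skipping of non-splitting queries, and the tie-breaking are assumptions made here; the small problem at the end is a made-up example in which one query separates the two groups.

```python
import math

def group_d_alpha(indices, prior, labels, alpha):
    """D_alpha computed over the group masses at a node."""
    total = sum(prior[i] for i in indices)
    group_mass = {}
    for i in indices:
        group_mass[labels[i]] = group_mass.get(labels[i], 0.0) + prior[i]
    return sum((m / total) ** alpha for m in group_mass.values()) ** (1.0 / alpha)

def lambda_ggbs(B, prior, labels, lam):
    """Return, for every object, the depth of the leaf at which its group is identified."""
    alpha = 1.0 / (1.0 + math.log2(lam))
    depths = {}

    def grow(indices, used, depth):
        if len({labels[i] for i in indices}) == 1:   # a single group remains: stop splitting
            for i in indices:
                depths[i] = depth
            return
        total = sum(prior[i] for i in indices)
        best = None
        for q in range(len(B[0])):
            if q in used:
                continue
            parts = ([i for i in indices if B[i][q] == 0],
                     [i for i in indices if B[i][q] == 1])
            if not parts[0] or not parts[1]:
                continue  # assumption: ignore queries that do not split this node
            cost = sum(sum(prior[i] for i in part) / total
                       * group_d_alpha(part, prior, labels, alpha) for part in parts)
            if best is None or cost < best[0]:
                best = (cost, q, parts)
        if best is None:
            raise ValueError("remaining queries cannot separate the groups at this node")
        _, q, parts = best
        for part in parts:
            grow(part, used | {q}, depth + 1)

    grow(list(range(len(B))), frozenset(), 0)
    return [depths[i] for i in range(len(B))]

# Hypothetical problem: the second query (index 1) separates group 2 from group 1.
B = [[0, 1, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
labels = [1, 1, 1, 2]
prior = [0.25, 0.25, 0.25, 0.25]
depths = lambda_ggbs(B, prior, labels, lam=2.0)
print(depths)
```

Here the group-aware cost prefers the unbalanced query that isolates group 2, so every object's group is identified after a single query.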



3.1 Average case 



The interpretation of $\lambda$-GGBS is somewhat easier in the limiting case when $\lambda$ tends to one. In addition
to the reduction factor defined in Section 2.1, we define a new parameter called the group reduction factor
for each group $i \in \{1, \cdots, m\}$ at each internal node.

Definition 2 (Group reduction factor). Let $T$ be a decision tree constructed on the query learning
problem for group identification $(\mathbf{B}, \Pi, \mathbf{y})$. The group reduction factor for any group $i$ at an internal node
'a' in the tree is defined by $\rho^i_a = \max\{ \pi_{\Theta^i_{l(a)}}, \pi_{\Theta^i_{r(a)}} \} / \pi_{\Theta^i_a}$.

Corollary 3. The expected number of queries required to identify the group of an unknown object using a
tree $T \in \mathcal{T}(\mathbf{B}, \Pi, \mathbf{y})$ is given by

$$L_1(\Pi) = H(\Pi_y) + \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ 1 - H(\rho_a) + \sum_{i=1}^{m} \frac{\pi_{\Theta^i_a}}{\pi_{\Theta_a}} H(\rho^i_a) \right] \qquad (8)$$

where $\Pi_y = (\pi_{\Theta^1}, \cdots, \pi_{\Theta^m})$ denotes the probability distribution of the object groups induced by the labels $\mathbf{y}$
and $H(\cdot)$ denotes the Shannon entropy.

Proof. The result follows from Theorem 2 by taking the limit as $\lambda$ tends to 1 and applying L'Hôpital's rule
on both sides of (6). □

This corollary states that given a query learning problem for group identification $(\mathbf{B}, \Pi, \mathbf{y})$, the expected
number of queries required to identify the group of an unknown object is lower bounded by the Shannon
entropy of the probability distribution of the groups. It also follows from the above result that this lower
bound is achieved iff the reduction factor $\rho_a$ is equal to 0.5 and the group reduction factors $\{\rho^i_a\}_{i=1}^{m}$ are
equal to 1 at every internal node in the tree. Also, note that the result in Corollary 1 is a special case of
this result where each group is of size 1, leading to $\rho^i_a = 1$ for all groups at every internal node.
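The formula in (8) can be verified on a small tree. The sketch below assumes a uniform prior, made-up group labels, and a made-up tree in which leaves are lists of object indices and internal nodes are pairs (an encoding chosen for illustration). It compares the expected depth at which the group is identified with the group entropy plus the per-node terms.

```python
import math

def h2(p):
    """Binary Shannon entropy in bits, with H(0) = H(1) = 0."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def objects(node):
    """A leaf is a list of object indices; an internal node is a (left, right) pair."""
    return node if isinstance(node, list) else objects(node[0]) + objects(node[1])

def mass(indices, prior, labels=None, group=None):
    """Prior mass of the given objects, optionally restricted to one group."""
    return sum(prior[i] for i in indices if group is None or labels[i] == group)

def average_depth(node, prior, depth=0):
    if isinstance(node, list):
        return mass(node, prior) * depth
    return sum(average_depth(child, prior, depth + 1) for child in node)

def gap(node, prior, labels):
    """Sum over internal nodes of the bracketed term in (8), weighted by the node mass."""
    if isinstance(node, list):
        return 0.0
    here, left, right = objects(node), objects(node[0]), objects(node[1])
    m = mass(here, prior)
    term = 1.0 - h2(max(mass(left, prior), mass(right, prior)) / m)
    for g in sorted({labels[i] for i in here}):
        m_g = mass(here, prior, labels, g)
        rho_g = max(mass(left, prior, labels, g), mass(right, prior, labels, g)) / m_g
        term += m_g / m * h2(rho_g)   # group reduction factor contribution
    return m * term + gap(node[0], prior, labels) + gap(node[1], prior, labels)

prior = [0.25, 0.25, 0.25, 0.25]   # hypothetical uniform prior
labels = [1, 1, 1, 2]              # hypothetical group labels
tree = ([0, 2], ([3], [1]))        # hypothetical tree: one leaf at depth 1, two at depth 2

group_masses = [mass(range(4), prior, labels, g) for g in (1, 2)]
group_entropy = -sum(m * math.log2(m) for m in group_masses)
lhs = average_depth(tree, prior)
rhs = group_entropy + gap(tree, prior, labels)
print(lhs, rhs)
```

In this example the whole gap comes from the root, where the balanced split scatters group 1 across both children.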

Using this result, the problem of finding a decision tree with minimum $L_1(\Pi)$ can be formulated as the
following optimization problem:

$$\min_{T \in \mathcal{T}(\mathbf{B}, \Pi, \mathbf{y})} \sum_{a \in \mathcal{I}} \pi_{\Theta_a} \left[ 1 - H(\rho_a) + \sum_{i=1}^{m} \frac{\pi_{\Theta^i_a}}{\pi_{\Theta_a}} H(\rho^i_a) \right]. \qquad (9)$$

We propose a greedy top-down approach and minimize the objective function by minimizing the term
$\pi_{\Theta_a} \left[ 1 - H(\rho_a) + \sum_{i=1}^{m} \frac{\pi_{\Theta^i_a}}{\pi_{\Theta_a}} H(\rho^i_a) \right]$ at each internal node, starting from the root node. Note that the terms
that depend on the query chosen at node 'a' are $\rho_a$ and $\rho^i_a$. Hence the algorithm reduces to minimizing
$C_a := 1 - H(\rho_a) + \sum_{i=1}^{m} \frac{\pi_{\Theta^i_a}}{\pi_{\Theta_a}} H(\rho^i_a)$ at each internal node $a \in \mathcal{I}$. Note that this objective function consists
of two terms: the first term $[1 - H(\rho_a)]$ favors queries that evenly distribute the probability mass of the
objects at node 'a' to its child nodes (regardless of the group), while the second term $\sum_{i} \frac{\pi_{\Theta^i_a}}{\pi_{\Theta_a}} H(\rho^i_a)$ favors
queries that transfer an entire group of objects to one of its child nodes. The algorithm, which we refer to
as GGBS, is summarized in Algorithm 4.



Group identification Generalized Binary Search (GGBS)

Initialization: Let the leaf set consist of the root node, $Q_{root} = \emptyset$
while some leaf node 'a' has more than one group of objects do
    for each query $q \in Q \setminus Q_a$ do
        Compute $\{\rho^i_a\}_{i=1}^{m}$ and $\rho_a$ produced by making a split with query $q$
        Compute the cost $C_a(q)$ of making a split with query $q$
    end
    Choose a query with the least cost $C_a$ at node 'a'
    Form child nodes $l(a)$, $r(a)$
end

Algorithm 4: Greedy decision tree algorithm for group identification that minimizes average linear cost

There is an interesting connection between the above algorithm and impurity-based decision tree induction.
In particular, the above algorithm is equivalent to the decision tree splitting algorithm used in the C4.5
software package [28], based on the entropy impurity measure. See [29] for more details on this relation.
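One way to read this connection is to rewrite the GGBS cost as an information gain. In the sketch below, $Z$ (the binary response to the candidate query) and $Y$ (the group label) are symbols introduced here for illustration, both understood for an object drawn from the prior restricted to node 'a'.

```latex
H(Z) = H(\rho_a), \qquad
H(Z \mid Y) = \sum_{i=1}^{m} \frac{\pi_{\Theta^i_a}}{\pi_{\Theta_a}} H(\rho^i_a)
\quad \Longrightarrow \quad
C_a = 1 - \bigl[ H(Z) - H(Z \mid Y) \bigr] = 1 - I(Z; Y)
    = 1 - \bigl[ H(Y) - H(Y \mid Z) \bigr].
```

Under this reading, minimizing $C_a$ amounts to maximizing the reduction in the entropy of the group label produced by the split.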



3.2 Worst case 

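We now present $\lambda$-GGBS in the limiting case when $\lambda$ tends to infinity. As noted in Section 3, the exponential
cost function $L_\lambda(\Pi)$ reduces to the worst case depth of any leaf node in this limiting case. Let $N_a$ denote
the number of groups at any node 'a' in the tree, i.e., $N_a = |\{ i \in \{1, \cdots, m\} : \Theta^i_a \neq \emptyset \}|$.

Corollary 4. In the limiting case when $\lambda \to \infty$, the optimization problem

$$\min \log_\lambda \left( \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right) \longrightarrow \min \max\{ N_{l(a)}, N_{r(a)} \}$$

where $\mathcal{D}_\alpha(\Theta_a) = \left[ \sum_{i=1}^{m} \left( \frac{\pi_{\Theta^i_a}}{\pi_{\Theta_a}} \right)^\alpha \right]^{1/\alpha}$.

Proof. Applying L'Hôpital's rule, we get

$$\lim_{\lambda \to \infty} \log_\lambda \left( \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) + \frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)}) \right) = \max\{ \log_2 N_{l(a)}, \log_2 N_{r(a)} \}$$

Since $\log_2$ is a monotonic increasing function, the optimization problem $\min \max\{ \log_2 N_{l(a)}, \log_2 N_{r(a)} \}$ is
equivalent to the optimization problem $\min \max\{ N_{l(a)}, N_{r(a)} \}$. □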



Note that the cost function minimized at each internal node of a tree in $\lambda$-GGBS is $C_a := \frac{\pi_{\Theta_{l(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{l(a)}) +
\frac{\pi_{\Theta_{r(a)}}}{\pi_{\Theta_a}} \mathcal{D}_\alpha(\Theta_{r(a)})$. Since $\log_\lambda$ is a monotonic function, this is equivalent to minimizing the function $\log_\lambda(C_a)$.
We know from Corollary 4 that in the limiting case when $\lambda$ tends to infinity, this reduces to minimizing
$\max\{ N_{l(a)}, N_{r(a)} \}$ at each internal node in the tree.

Figure 3: Experiments to demonstrate the improved performance of $\lambda$-GBS over GBS and GBS with uniform prior.
The plots in the first column correspond to the WISER database and those in the second column correspond to
synthetic data.

4 Experiments 

We compare the proposed algorithms with GBS on both synthetic data and a real dataset known as
WISER, which is a toxic chemical database describing the binary relationship between 298 toxic chemicals
and 79 acute symptoms. We only present results for object (as opposed to group) identification. Figure 3
demonstrates the improved performance of $\lambda$-GBS over standard GBS, and GBS with uniform prior, over
a range of $\lambda$ values. Each curve corresponds to the average value of the cost function $L_\lambda(\Pi)$ as a function
of $\lambda$ over 100 repetitions.

The plots in the first column correspond to the WISER database, which has been studied in more detail
in [29]. Here, in each repetition, the prior is generated according to Zipf's law, i.e., $\left( k^{-\beta} / \sum_{i=1}^{M} i^{-\beta} \right)_{k=1}^{M}$,
$\beta \ge 0$, after randomly permuting the objects. Note that in the special case when $\beta = 0$, this reduces to
the uniform distribution and as $\beta$ increases, it tends to a skewed distribution with most of the probability
mass concentrated on a single object.
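A prior of this form can be generated as in the sketch below. The function name, the use of Python's `random.Random` for the permutation, and the seeds are choices made here for illustration; only the Zipf weights and the random permutation come from the description above.

```python
import random

def zipf_prior(num_objects, beta, rng):
    """Zipf prior k**(-beta) / sum_i i**(-beta), randomly assigned to the objects."""
    weights = [k ** -beta for k in range(1, num_objects + 1)]
    total = sum(weights)
    prior = [w / total for w in weights]
    rng.shuffle(prior)   # random permutation of the objects
    return prior

uniform = zipf_prior(298, 0.0, random.Random(0))   # beta = 0 gives the uniform prior
skewed = zipf_prior(298, 2.0, random.Random(0))    # larger beta concentrates the mass
print(max(uniform), max(skewed))
```

The number of objects, 298, matches the size of the WISER database mentioned above; the values of $\beta$ are arbitrary examples.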

The plots in the second column correspond to synthetic data based on an active learning application.
We consider a two-dimensional setting where the classifiers are restricted to be linear classifiers of the form
$\mathrm{sign}(x_i - c)$, $\mathrm{sign}(c - x_i)$, where $i = 1, 2$ and $c$ takes on 25 distinct values. The number of distinct classifiers
is therefore 100, and the number of queries is $26^2 = 676$. The goal is to identify the classifier by selecting
queries judiciously. Here, the prior is generated such that the classifiers that are close to $x_i = 0$ are more
likely than the ones away from the axes, with their relative probability decreasing according to Zipf's law
$k^{-\beta}$, $\beta \ge 0$. Hence, the prior is the same in each repetition. However, the randomness in each repetition
comes from the greedy algorithms due to the presence of multiple best splits at each internal node. Note
that in all the experiments, $\lambda$-GBS performs better than GBS and GBS with uniform prior. We also see
that $\lambda$-GBS converges to GBS as $\lambda \to 1$ and to GBS with uniform prior as $\lambda \to \infty$.

5 Conclusions 

In this paper, we show that generalized binary search (GBS) is a greedy algorithm to optimize the expected 
number of queries needed to identify an object. We develop two extensions of GBS, motivated by 
the problem of toxic chemical identification. First, we derive a greedy algorithm, $\lambda$-GBS, to minimize the 
expected exponentially weighted query cost. The average and worst cases fall out in the limits as $\lambda \to 1$ 
and $\lambda \to \infty$, and correspond to GBS and GBS with uniform prior, respectively. Second, we suppose the 
objects are partitioned into groups, and the goal is to identify only the group of the unknown object. Once 
again, we propose a greedy algorithm, $\lambda$-GGBS, to minimize the expected exponentially weighted query 
cost. The algorithms are derived in a common framework. In particular, we prove exact formulas for the 
exponentially weighted query cost that close the gap between previously known lower bounds related to 
Rényi entropy. These exact formulas are then optimized in a greedy, top-down manner to construct a 
decision tree. An interesting open question is to relate these greedy algorithms to the global optimizer of 
the exponentially weighted cost function. 

Acknowledgments: G. Bellala and C. Scott were supported in part by NSF Awards No. 0830490 and 
0953135. S. Bhavnani was supported in part by NIH grant No. UL1RR024986. The authors would like to 

A Proof of Theorem [2] 

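Define two new functions $\tilde{L}_\lambda$ and $\tilde{H}_\alpha$ as 

$$\tilde{L}_\lambda := \frac{1}{\lambda - 1}\left[\sum_{j \in \mathcal{L}} \pi_{\Theta_j} \lambda^{d_j} - 1\right] = \sum_{j \in \mathcal{L}} \pi_{\Theta_j} \left[\sum_{k=0}^{d_j - 1} \lambda^{k}\right],$$

$$\tilde{H}_\alpha := 1 - \frac{1}{\left(\sum_{i=1}^{m} \pi_{\Theta^i}^{\alpha}\right)^{1/\alpha}}.$$

Noting that the cost function $L_\lambda(\Pi)$ can be written as 

$$L_\lambda(\Pi) = \log_\lambda\left(\sum_{j \in \mathcal{L}} \pi_{\Theta_j} \lambda^{d_j}\right),$$

the new function $\tilde{L}_\lambda$ can be related to the cost function $L_\lambda(\Pi)$ as 

$$\lambda^{L_\lambda(\Pi)} = (\lambda - 1)\tilde{L}_\lambda + 1. \tag{10}$$

Similarly, $\tilde{H}_\alpha$ is related to the $\alpha$-Rényi entropy $H_\alpha(\Pi_y)$ as 

$$H_\alpha(\Pi_y) = \frac{1}{1-\alpha}\log_2 \sum_{i=1}^{m} \pi_{\Theta^i}^{\alpha} = \frac{1}{\alpha \log_2 \lambda}\log_2 \sum_{i=1}^{m} \pi_{\Theta^i}^{\alpha} = \log_\lambda \left(\sum_{i=1}^{m} \pi_{\Theta^i}^{\alpha}\right)^{1/\alpha} \tag{11a}$$

$$\implies \lambda^{H_\alpha(\Pi_y)} = \left(\sum_{i=1}^{m} \pi_{\Theta^i}^{\alpha}\right)^{1/\alpha}, \tag{11b}$$

where we use the definition of $\alpha$, i.e., $\alpha = \frac{1}{1 + \log_2 \lambda}$, in (11a). Hence 
$\tilde{H}_\alpha = 1 - \lambda^{-H_\alpha(\Pi_y)}$. 

Now, we note from Lemma [1] that $\tilde{L}_\lambda$ can be decomposed as 

$$\tilde{L}_\lambda = \sum_{a \in \mathcal{I}} \lambda^{d_a} \pi_{\Theta_a} \implies \lambda^{L_\lambda(\Pi)} = 1 + \sum_{a \in \mathcal{I}} (\lambda - 1)\lambda^{d_a} \pi_{\Theta_a}, \tag{12}$$

where $d_a$ denotes the depth of internal node 'a' in the tree $\mathcal{T}$. Similarly, note from Lemma [2] that 
$\tilde{H}_\alpha$ can be decomposed as 

$$\tilde{H}_\alpha = \frac{1}{\left(\sum_{i=1}^{m} \pi_{\Theta^i}^{\alpha}\right)^{1/\alpha}} \sum_{a \in \mathcal{I}} \left[\pi_{\Theta_a}\mathcal{D}_\alpha(\Theta_a) - \pi_{\Theta_{l(a)}}\mathcal{D}_\alpha(\Theta_{l(a)}) - \pi_{\Theta_{r(a)}}\mathcal{D}_\alpha(\Theta_{r(a)})\right]$$

$$\implies \lambda^{H_\alpha(\Pi_y)} = 1 + \sum_{a \in \mathcal{I}} \left[\pi_{\Theta_a}\mathcal{D}_\alpha(\Theta_a) - \pi_{\Theta_{l(a)}}\mathcal{D}_\alpha(\Theta_{l(a)}) - \pi_{\Theta_{r(a)}}\mathcal{D}_\alpha(\Theta_{r(a)})\right]. \tag{13}$$

Finally, the result follows from (12) and (13) above. 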
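Subtracting (13) from (12) gives $\lambda^{L_\lambda(\Pi)} = \lambda^{H_\alpha(\Pi_y)} + \sum_{a \in \mathcal{I}} \left[(\lambda - 1)\lambda^{d_a}\pi_{\Theta_a} - \pi_{\Theta_a}\mathcal{D}_\alpha(\Theta_a) + \pi_{\Theta_{l(a)}}\mathcal{D}_\alpha(\Theta_{l(a)}) + \pi_{\Theta_{r(a)}}\mathcal{D}_\alpha(\Theta_{r(a)})\right]$. As a sanity check, the following Python sketch evaluates both sides on a small tree. The tree encoding (leaves as lists of `(prior, group)` pairs, internal nodes as `(left, right)` tuples), the example numbers, and all function names are assumptions made for illustration.

```python
import math

# A leaf is a list of (prior, group) pairs drawn from a single group;
# an internal node is a (left, right) tuple.

def objects(node):
    """All (prior, group) pairs reaching this node."""
    if isinstance(node, list):
        return node
    return objects(node[0]) + objects(node[1])

def mass(node):
    return sum(p for p, _ in objects(node))

def weighted_d(node, alpha):
    """pi_{Theta_a} * D_alpha(Theta_a) = (sum_i pi_{Theta_a^i}^alpha)^(1/alpha)."""
    per_group = {}
    for p, g in objects(node):
        per_group[g] = per_group.get(g, 0.0) + p
    return sum(m ** alpha for m in per_group.values()) ** (1.0 / alpha)

def direct_cost(node, lam, depth=0):
    """lambda^{L_lambda(Pi)}: sum over leaves of leaf mass times lambda^depth."""
    if isinstance(node, list):
        return mass(node) * lam ** depth
    return sum(direct_cost(child, lam, depth + 1) for child in node)

def internal_sum(node, lam, alpha, depth=0):
    """Sum of the bracketed terms over the internal nodes below `node`."""
    if isinstance(node, list):
        return 0.0
    left, right = node
    term = ((lam - 1.0) * lam ** depth * mass(node)
            - weighted_d(node, alpha)
            + weighted_d(left, alpha) + weighted_d(right, alpha))
    return (term + internal_sum(left, lam, alpha, depth + 1)
            + internal_sum(right, lam, alpha, depth + 1))

def formula_cost(tree, lam):
    """Right-hand side of the identity; assumes the root mass is 1 and lam > 1."""
    alpha = 1.0 / (1.0 + math.log2(lam))
    return weighted_d(tree, alpha) + internal_sum(tree, lam, alpha)

if __name__ == "__main__":
    tree = ([(0.4, "A"), (0.1, "A")], ([(0.3, "B")], [(0.2, "C")]))
    print(direct_cost(tree, 2.0), formula_cost(tree, 2.0))
```

The root term `weighted_d(tree, alpha)` plays the role of $\lambda^{H_\alpha(\Pi_y)}$ by (11b), which is why the sketch requires the priors to sum to one.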



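Lemma 1. The function $\tilde{L}_\lambda$ can be decomposed over the internal nodes in a tree $\mathcal{T}$, as 

$$\tilde{L}_\lambda = \sum_{a \in \mathcal{I}} \lambda^{d_a} \pi_{\Theta_a},$$

where $d_a$ denotes the depth of internal node $a \in \mathcal{I}$ and $\pi_{\Theta_a}$ is the probability mass of the 
objects at that node. 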
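Proof. Let $\mathcal{T}_a$ denote a subtree from any internal node 'a' in the tree $\mathcal{T}$ and let $\mathcal{I}_a, \mathcal{L}_a$ denote 
the set of internal nodes and leaf nodes in the subtree $\mathcal{T}_a$, respectively. Then, define $\tilde{L}_\lambda^a$ in the 
subtree $\mathcal{T}_a$ to be 

$$\tilde{L}_\lambda^a = \sum_{j \in \mathcal{L}_a} \frac{\pi_{\Theta_j}}{\pi_{\Theta_a}} \left[\sum_{k=0}^{d_j^a - 1} \lambda^{k}\right],$$

where $d_j^a$ denotes the depth of leaf node $j \in \mathcal{L}_a$ in the subtree $\mathcal{T}_a$. 

Now, we show using induction that for any subtree $\mathcal{T}_a$ in the tree $\mathcal{T}$, the following relation holds 

$$\pi_{\Theta_a} \tilde{L}_\lambda^a = \sum_{s \in \mathcal{I}_a} \lambda^{d_s^a} \pi_{\Theta_s}, \tag{14}$$

where $d_s^a$ denotes the depth of internal node $s \in \mathcal{I}_a$ in the subtree $\mathcal{T}_a$. 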

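The relation holds trivially for any subtree $\mathcal{T}_a$ rooted at an internal node $a \in \mathcal{I}$ both of whose child 
nodes terminate as leaf nodes, with both the left hand side and the right hand side of the expression equal to 
$\pi_{\Theta_a}$. Now, consider a subtree $\mathcal{T}_a$ rooted at an internal node $a \in \mathcal{I}$ whose left child (or right child) 
alone terminates as a leaf node. Assume that the above relation holds true for the subtree rooted at the right 
child of node 'a'. Then, 

$$\begin{aligned}
\pi_{\Theta_a} \tilde{L}_\lambda^a &= \sum_{j \in \mathcal{L}_a} \pi_{\Theta_j} \left[\sum_{k=0}^{d_j^a - 1} \lambda^{k}\right] \\
&= \sum_{\{j \in \mathcal{L}_a : d_j^a = 1\}} \pi_{\Theta_j} + \sum_{\{j \in \mathcal{L}_a : d_j^a > 1\}} \pi_{\Theta_j} \left[\sum_{k=0}^{d_j^a - 1} \lambda^{k}\right] \\
&= \pi_{\Theta_{l(a)}} + \sum_{\{j \in \mathcal{L}_a : d_j^a > 1\}} \pi_{\Theta_j} \left[1 + \lambda \sum_{k=0}^{d_j^a - 2} \lambda^{k}\right] \\
&= \pi_{\Theta_a} + \lambda \sum_{j \in \mathcal{L}_{r(a)}} \pi_{\Theta_j} \left[\sum_{k=0}^{d_j^{r(a)} - 1} \lambda^{k}\right] \\
&= \pi_{\Theta_a} + \lambda \sum_{s \in \mathcal{I}_{r(a)}} \lambda^{d_s^{r(a)}} \pi_{\Theta_s},
\end{aligned}$$

where the last step follows from the induction hypothesis. The final expression equals the right hand side 
of (14) because $d_a^a = 0$ and $d_s^a = d_s^{r(a)} + 1$ for every $s \in \mathcal{I}_{r(a)}$. Finally, consider a subtree $\mathcal{T}_a$ 
rooted at an internal node $a \in \mathcal{I}$ neither of whose child nodes terminates as a leaf node. Assume that the 
relation in (14) holds true for the subtrees rooted at its left and right child nodes. Then, 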



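$$\begin{aligned}
\pi_{\Theta_a} \tilde{L}_\lambda^a &= \sum_{j \in \mathcal{L}_a} \pi_{\Theta_j} \left[\sum_{k=0}^{d_j^a - 1} \lambda^{k}\right] \\
&= \sum_{j \in \mathcal{L}_{l(a)}} \pi_{\Theta_j} \left[1 + \lambda \sum_{k=0}^{d_j^a - 2} \lambda^{k}\right] + \sum_{j \in \mathcal{L}_{r(a)}} \pi_{\Theta_j} \left[1 + \lambda \sum_{k=0}^{d_j^a - 2} \lambda^{k}\right] \\
&= \pi_{\Theta_a} + \lambda \sum_{j \in \mathcal{L}_{l(a)}} \pi_{\Theta_j} \left[\sum_{k=0}^{d_j^{l(a)} - 1} \lambda^{k}\right] + \lambda \sum_{j \in \mathcal{L}_{r(a)}} \pi_{\Theta_j} \left[\sum_{k=0}^{d_j^{r(a)} - 1} \lambda^{k}\right] \\
&= \pi_{\Theta_a} + \lambda \left[\sum_{s \in \mathcal{I}_{l(a)}} \lambda^{d_s^{l(a)}} \pi_{\Theta_s} + \sum_{s \in \mathcal{I}_{r(a)}} \lambda^{d_s^{r(a)}} \pi_{\Theta_s}\right] \\
&= \sum_{s \in \mathcal{I}_a} \lambda^{d_s^a} \pi_{\Theta_s},
\end{aligned}$$

thereby completing the induction. Finally, the result follows by applying the relation in (14) to the tree $\mathcal{T}$, 
whose probability mass at the root node is $\pi_{\Theta_a} = 1$. □ 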
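Lemma 1 can be checked numerically on a toy tree. The Python sketch below computes $\tilde{L}_\lambda$ once as a sum over leaves and once as a sum over internal nodes; the encoding of a tree as nested tuples with float leaf masses, the example tree, and the function names are all assumptions made for illustration.

```python
# A leaf is a float (its probability mass); an internal node is a tuple of children.

def node_mass(node):
    """Probability mass of the objects reaching this node."""
    if isinstance(node, float):
        return node
    return sum(node_mass(child) for child in node)

def leaf_form(node, lam, depth=0):
    """Sum over leaves of mass * (1 + lam + ... + lam^(depth - 1))."""
    if isinstance(node, float):
        return node * sum(lam ** k for k in range(depth))
    return sum(leaf_form(child, lam, depth + 1) for child in node)

def internal_form(node, lam, depth=0):
    """Sum over internal nodes of lam^depth * mass, as in Lemma 1."""
    if isinstance(node, float):
        return 0.0
    here = lam ** depth * node_mass(node)
    return here + sum(internal_form(child, lam, depth + 1) for child in node)

if __name__ == "__main__":
    tree = (0.5, (0.3, 0.2))  # one leaf at depth 1, two leaves at depth 2
    print(leaf_form(tree, 3.0), internal_form(tree, 3.0))
```

For this tree with $\lambda = 3$, the leaf form is $0.5 \cdot 1 + 0.3 \cdot 4 + 0.2 \cdot 4$ and the internal-node form is $1 \cdot 1 + 3 \cdot 0.5$, so both evaluate to 2.5 up to floating-point rounding.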

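Lemma 2. The function $\tilde{H}_\alpha$ can be decomposed over the internal nodes in a tree $\mathcal{T}$, as 

$$\tilde{H}_\alpha = \frac{1}{\left(\sum_{i=1}^{m} \pi_{\Theta^i}^{\alpha}\right)^{1/\alpha}} \sum_{a \in \mathcal{I}} \left[\pi_{\Theta_a}\mathcal{D}_\alpha(\Theta_a) - \pi_{\Theta_{l(a)}}\mathcal{D}_\alpha(\Theta_{l(a)}) - \pi_{\Theta_{r(a)}}\mathcal{D}_\alpha(\Theta_{r(a)})\right],$$

where $\mathcal{D}_\alpha(\Theta_a) := \left[\sum_{i=1}^{m} \left(\frac{\pi_{\Theta_a^i}}{\pi_{\Theta_a}}\right)^{\alpha}\right]^{1/\alpha}$ and $\pi_{\Theta_a}$ denotes the probability mass of the 
objects at any internal node $a \in \mathcal{I}$. 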



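Proof. Let $\mathcal{T}_a$ denote a subtree from any internal node 'a' in the tree $\mathcal{T}$ and let $\mathcal{I}_a$ denote the set of 
internal nodes in the subtree $\mathcal{T}_a$. Then, define $\tilde{H}_\alpha^a$ in a subtree $\mathcal{T}_a$ to be 

$$\tilde{H}_\alpha^a = 1 - \frac{\pi_{\Theta_a}}{\left[\sum_{i=1}^{m} \pi_{\Theta_a^i}^{\alpha}\right]^{1/\alpha}}.$$

Now, we show using induction that for any subtree $\mathcal{T}_a$ in the tree $\mathcal{T}$, the following relation holds 

$$\left[\sum_{i=1}^{m} \pi_{\Theta_a^i}^{\alpha}\right]^{1/\alpha} \tilde{H}_\alpha^a = \sum_{s \in \mathcal{I}_a} \left[\pi_{\Theta_s}\mathcal{D}_\alpha(\Theta_s) - \pi_{\Theta_{l(s)}}\mathcal{D}_\alpha(\Theta_{l(s)}) - \pi_{\Theta_{r(s)}}\mathcal{D}_\alpha(\Theta_{r(s)})\right]. \tag{15}$$

Note that the relation holds trivially for any subtree $\mathcal{T}_a$ rooted at an internal node $a \in \mathcal{I}$ both of whose 
child nodes terminate as leaf nodes. Now, consider a subtree $\mathcal{T}_a$ rooted at any other internal node $a \in \mathcal{I}$. 
Assume the above relation holds true for the subtrees rooted at its left and right child nodes (if a child is a 
leaf node, it contains objects from a single group and both sides of the relation vanish for it). Then, using 
$\pi_{\Theta_a} = \pi_{\Theta_{l(a)}} + \pi_{\Theta_{r(a)}}$ and $\pi_{\Theta_a}\mathcal{D}_\alpha(\Theta_a) = \left[\sum_{i=1}^{m} \pi_{\Theta_a^i}^{\alpha}\right]^{1/\alpha}$, 

$$\begin{aligned}
\left[\sum_{i=1}^{m} \pi_{\Theta_a^i}^{\alpha}\right]^{1/\alpha} \tilde{H}_\alpha^a
&= \left[\sum_{i=1}^{m} \pi_{\Theta_a^i}^{\alpha}\right]^{1/\alpha} - \pi_{\Theta_a} \\
&= \left[\sum_{i=1}^{m} \pi_{\Theta_a^i}^{\alpha}\right]^{1/\alpha} - \left[\sum_{i=1}^{m} \pi_{\Theta_{l(a)}^i}^{\alpha}\right]^{1/\alpha} - \left[\sum_{i=1}^{m} \pi_{\Theta_{r(a)}^i}^{\alpha}\right]^{1/\alpha} \\
&\quad + \left(\left[\sum_{i=1}^{m} \pi_{\Theta_{l(a)}^i}^{\alpha}\right]^{1/\alpha} - \pi_{\Theta_{l(a)}}\right) + \left(\left[\sum_{i=1}^{m} \pi_{\Theta_{r(a)}^i}^{\alpha}\right]^{1/\alpha} - \pi_{\Theta_{r(a)}}\right) \\
&= \left[\pi_{\Theta_a}\mathcal{D}_\alpha(\Theta_a) - \pi_{\Theta_{l(a)}}\mathcal{D}_\alpha(\Theta_{l(a)}) - \pi_{\Theta_{r(a)}}\mathcal{D}_\alpha(\Theta_{r(a)})\right] \\
&\quad + \left[\sum_{i=1}^{m} \pi_{\Theta_{l(a)}^i}^{\alpha}\right]^{1/\alpha} \tilde{H}_\alpha^{l(a)} + \left[\sum_{i=1}^{m} \pi_{\Theta_{r(a)}^i}^{\alpha}\right]^{1/\alpha} \tilde{H}_\alpha^{r(a)} \\
&= \sum_{s \in \mathcal{I}_a} \left[\pi_{\Theta_s}\mathcal{D}_\alpha(\Theta_s) - \pi_{\Theta_{l(s)}}\mathcal{D}_\alpha(\Theta_{l(s)}) - \pi_{\Theta_{r(s)}}\mathcal{D}_\alpha(\Theta_{r(s)})\right],
\end{aligned}$$

where the last step follows from the induction hypothesis. Finally, the result follows by applying the 
relation in (15) to the tree $\mathcal{T}$. □ 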
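The telescoping structure behind Lemma 2 can also be checked numerically. The Python sketch below compares $\tilde{H}_\alpha = 1 - \left(\sum_{i=1}^{m} \pi_{\Theta^i}^{\alpha}\right)^{-1/\alpha}$ with its decomposition over internal nodes, and additionally checks that $\lambda^{H_\alpha(\Pi_y)}$ equals the root norm as in (11b). The encoding (leaves as single-group dicts `{group: mass}`, internal nodes as `(left, right)` tuples), the example tree, and the function names are assumptions made for illustration.

```python
import math

# A leaf is a dict {group: mass} with a single group; an internal node is (left, right).

def group_masses(node):
    """Per-group probability mass of the objects reaching this node."""
    if isinstance(node, dict):
        return dict(node)
    merged = group_masses(node[0])
    for group, m in group_masses(node[1]).items():
        merged[group] = merged.get(group, 0.0) + m
    return merged

def norm(node, alpha):
    """pi_{Theta_a} * D_alpha(Theta_a) = (sum_i pi_{Theta_a^i}^alpha)^(1/alpha)."""
    return sum(m ** alpha for m in group_masses(node).values()) ** (1.0 / alpha)

def telescoped(node, alpha):
    """Sum of the bracketed Lemma 2 terms over the internal nodes below `node`."""
    if isinstance(node, dict):
        return 0.0
    left, right = node
    here = norm(node, alpha) - norm(left, alpha) - norm(right, alpha)
    return here + telescoped(left, alpha) + telescoped(right, alpha)

def h_tilde_direct(tree, alpha):
    """Definition of H-tilde; assumes the total mass at the root is 1."""
    return 1.0 - 1.0 / norm(tree, alpha)

def h_tilde_decomposed(tree, alpha):
    return telescoped(tree, alpha) / norm(tree, alpha)

def renyi_entropy_bits(masses, alpha):
    """alpha-Renyi entropy in bits, for alpha != 1."""
    return math.log2(sum(p ** alpha for p in masses)) / (1.0 - alpha)

if __name__ == "__main__":
    lam = 4.0
    alpha = 1.0 / (1.0 + math.log2(lam))
    tree = (({"A": 0.4}, {"B": 0.3}), ({"A": 0.1}, {"C": 0.2}))
    print(h_tilde_direct(tree, alpha), h_tilde_decomposed(tree, alpha))
```

In the example, group A is split across both subtrees, so the intermediate norms are not simply sums of leaf masses; the two expressions still agree because every non-root, non-leaf norm cancels in the sum.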